Skip to content

Add basic OpenTelemetry tracing for client and server requests#2381

Merged
maxisbey merged 16 commits intomainfrom
marcelo/otel-spans
Mar 31, 2026
Merged

Add basic OpenTelemetry tracing for client and server requests#2381
maxisbey merged 16 commits intomainfrom
marcelo/otel-spans

Conversation

@Kludex
Copy link
Copy Markdown
Member

@Kludex Kludex commented Mar 31, 2026

Summary

Adds OpenTelemetry tracing to MCP client and server request handling, with W3C trace context propagation via _meta per SEP-414.

opentelemetry-api is added as a mandatory dependency. It provides a no-op tracer by default - spans are only recorded when the user configures an OTel SDK (e.g. via Logfire, Jaeger, OTLP exporters).

Spans

  • Client (send_request): MCP send tools/call my_tool with SpanKind.CLIENT
  • Server (_handle_request): MCP handle tools/call my_tool with SpanKind.SERVER

Server spans record ERROR status when the handler returns ErrorData. Client spans record errors automatically via OTel's start_as_current_span when MCPError propagates.

Trace context propagation (SEP-414)

On send, traceparent/tracestate are injected into _meta of the JSON-RPC request. On the server side, _meta is extracted and used as the parent context for the handler span. This enables cross-process trace propagation over stdio and HTTP transports.

Attributes

  • mcp.method.name - the JSON-RPC method (e.g. tools/call)
  • jsonrpc.request.id - the request ID

New files

  • src/mcp/shared/_otel.py - OTel helpers (otel_span, inject_trace_context, extract_trace_context)
  • tests/shared/test_otel.py - E2E test verifying span names, kinds, attributes, and trace propagation

Test plan

  • E2E test: client calls a tool, verifies both client and server spans with correct names, kinds, attributes, and same trace ID
  • All existing tests pass
  • 100% coverage

Kludex added 5 commits March 31, 2026 13:53
Add opentelemetry-api as an optional dependency (mcp[otel]) and create
spans for client request/response cycles and server request handling.

Span names follow the pattern:
- Client: "MCP {method} {target}" (e.g. "MCP tools/call my_tool")
- Server: "MCP handle {method} {target}"

When opentelemetry-api is not installed, tracing is a complete no-op
with zero overhead.
Simplify _otel.py by removing the ImportError fallback and lru_cache
since opentelemetry-api is now always available.
Both client and server can send requests, so use 'MCP send' for the
sending side and 'MCP handle' for the receiving side consistently.
Inject traceparent/tracestate into request _meta on send, extract it
on the server side as the parent context for the handler span. This
enables cross-process trace propagation over stdio and HTTP transports.
Comment on lines +280 to +282
# Inject W3C trace context into _meta (SEP-414).
meta: dict[str, Any] = request_data.setdefault("params", {}).setdefault("_meta", {})
inject_trace_context(meta)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Nit: request_data.setdefault("params", {}).setdefault("_meta", {}) unconditionally adds params: {"_meta": {}} to every request, even paramless ones like ping and even when no OTel SDK is configured (where inject() is a no-op). Consider guarding the setdefault chain so _meta is only created when there is an active trace context to inject.

Extended reasoning...

What the bug is

At line 280-282 of session.py, inside send_request, the code unconditionally calls:

meta: dict[str, Any] = request_data.setdefault("params", {}).setdefault("_meta", {})
inject_trace_context(meta)

This creates both a params key and a nested _meta key on every outgoing request, regardless of whether the request originally had params or whether an OTel SDK is actually configured.

How it manifests

Consider a PingRequest, which has params=None. When serialized via model_dump(exclude_none=True), the resulting request_data dict has no params key at all. Before this PR, the wire format would be:

{"jsonrpc": "2.0", "id": 0, "method": "ping"}

After this PR, the setdefault chain inserts params: {"_meta": {}}, changing the wire format to:

{"jsonrpc": "2.0", "id": 0, "method": "ping", "params": {"_meta": {}}}

When no OTel SDK is installed (the default case, since opentelemetry-api provides only a no-op tracer), inject() writes nothing into the dict, leaving an empty _meta: {} in every request.

Why existing code doesn't prevent it

The setdefault calls are placed unconditionally inside the otel_span context manager, with no check for whether there is actually a valid span context to propagate. The inject_trace_context wrapper simply calls inject(meta) without any conditional logic.

Impact

The practical impact is low. _meta is a well-defined field in the MCP protocol, and RequestParams validates {"_meta": {}} correctly since all RequestParamsMeta fields are NotRequired. No server should break from receiving this. However, it adds unnecessary bytes to every single request on the wire, even when tracing is completely unused, and it represents an unnecessary behavioral change to the wire format.

Suggested fix

Only create the _meta dict when there is an active span context worth propagating. For example, inject into a temporary dict first and only merge it into request_data if the dict is non-empty after injection:

tmp: dict[str, Any] = {}
inject_trace_context(tmp)
if tmp:
    request_data.setdefault("params", {}).setdefault("_meta", {}).update(tmp)

Step-by-step proof

  1. Create a PingRequest with params=None.
  2. model_dump(exclude_none=True) produces {"method": "ping"}.
  3. request_data.setdefault("params", {}) inserts "params": {} and returns {}.
  4. .setdefault("_meta", {}) inserts "_meta": {} into that new params dict.
  5. inject_trace_context(meta) calls inject({}) which, with no OTel SDK, is a no-op.
  6. The final request_data is {"method": "ping", "params": {"_meta": {}}} with extra fields that were never there before.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An empty _meta: {} is harmless per the MCP spec. When no OTel SDK is configured, inject() is already a no-op (writes nothing). Guarding this adds complexity for no benefit.

task_support = self._experimental_handlers.task_support if self._experimental_handlers else None
# Get task metadata from request params if present
task_metadata = None
if hasattr(req, "params") and req.params is not None: # pragma: no branch
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 # pragma: no branch on line 482 is incorrect — PingRequest has params: RequestParams | None = None, so when a ping arrives req.params is None and the false branch is genuinely taken. Per CLAUDE.md, this pragma should only be used for coverage.py ->exit arc misreports on nested async with, not for real conditional branches. Remove the pragma to avoid masking missing test coverage for the None-params path.

Extended reasoning...

What the bug is

The PR adds # pragma: no branch to the condition if hasattr(req, "params") and req.params is not None: at server.py:482. This pragma tells coverage.py that the false branch of this conditional is never taken, suppressing any coverage warning for the missing branch.

Why this is incorrect

PingRequest is defined as Request[RequestParams | None, ...] with params: RequestParams | None = None (see types/_types.py:589). When the server handles a ping request, req.params IS None, so the condition evaluates to False and the false branch is genuinely executed — task_metadata stays None and execution falls through to constructing the ServerRequestContext.

Step-by-step proof

  1. A client sends a PingRequest with default params=None.
  2. The server dispatches to _handle_request with this request.
  3. At line 482, hasattr(req, "params") is True (PingRequest always has a params attribute), but req.params is not None is False.
  4. The overall condition is False, so the body (task_metadata = getattr(req.params, "task", None)) is skipped.
  5. task_metadata remains None from its initialization on line 481.
  6. However, # pragma: no branch tells coverage.py this false branch never happens, which is wrong.

Project convention violated

The project's CLAUDE.md (lines 55-56) explicitly states that # pragma: no branch should only be used when coverage.py misreports the ->exit arc for nested async with blocks on Python 3.11+. This is a genuine conditional branch, not an async-with exit arc issue.

Impact

The impact is limited to test coverage reporting — this has zero runtime effect. The pragma was not present in the original code (visible in the diff) and was newly added by this PR. It could mask the fact that tests don't exercise the None-params path through this conditional, defeating the purpose of the project's 100% branch coverage requirement.

Fix

Simply remove the # pragma: no branch comment from line 482. If coverage then reports the false branch as uncovered, add a test that exercises a PingRequest (which has params=None) to cover it properly.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-existing pragma, not introduced by this PR.

maxisbey
maxisbey previously approved these changes Mar 31, 2026
@maxisbey maxisbey enabled auto-merge (squash) March 31, 2026 20:33
@maxisbey maxisbey merged commit 37891f4 into main Mar 31, 2026
30 checks passed
@maxisbey maxisbey deleted the marcelo/otel-spans branch March 31, 2026 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants